-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Bug: Save original index and remap after function completes #61116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug: Save original index and remap after function completes #61116
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! I'm seeing a performance regression when the index does not contain duplicates. Can we do this conditionally.
I appreciate the review! Can you confirm the performance regression numbers that you are seeing? Note that the timings in the original ticket were from a different machine than the one I'm working on. ASV Benchmarks appear similar enough to be reasonable in the standard case. -- Original Function --
-- Function with my changes --
|
Hey, can I get a re-review on this? |
Thanks for the ping, will get to it in the next day or two. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@@ -61,6 +61,7 @@ Other enhancements | |||
- :meth:`Series.cummin` and :meth:`Series.cummax` now supports :class:`CategoricalDtype` (:issue:`52335`) | |||
- :meth:`Series.plot` now correctly handle the ``ylabel`` parameter for pie charts, allowing for explicit control over the y-axis label (:issue:`58239`) | |||
- :meth:`DataFrame.plot.scatter` argument ``c`` now accepts a column of strings, where rows with the same string are colored identically (:issue:`16827` and :issue:`16485`) | |||
- :meth:`Series.nlargest` uses a 'stable' sort internally and will preserve original ordering. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- :meth:`Series.nlargest` uses a 'stable' sort internally and will preserve original ordering. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke - there is a change in Series.nlargest
here where previously we weren't using a stable sorting algorithm in certain cases. Now the only dtype I think this can possibly effect is object (as otherwise equal implies identical), and even there it is quite uncommon to have an impact. Perhaps the line needs to make this more clear that it is unlikely to have any impact, or are you thinking it isn't even worth mentioning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah gotcha. OK doesnt hurt to include it in the whatsnew notes
7e9a2a3
to
8aa2a2c
Compare
Thanks @Jeffrharr |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.Note: I'm new to this project, so this is my first PR.
Saves the index for SeriesNLargest at algorithm start and resets it before returning. This fixes performance issues when the index has many duplicate values.
Results:
Tests
The existing tests should cover this unless we want to add specific tests via the
asv_bench
.Addendum
I also modified the call to sort to use
sort(kind="stable")
to get consistent ordering which is what is currently happening in the equivalent Frame method (it was usingkind=mergesort
which is equivalent tokind=stable
, but kept for portability). I can remove this -- it may be better in another PR.https://numpy.org/doc/stable/reference/generated/numpy.sort.html#numpy.sort